Pixar's Movies
Posted on Sun 23 September 2018 in Data Analysis
Analyze Pixar's Movies¶
The dataset covers Pixar's movies and contains information on each movie's critic ratings, revenue figures, production costs, Oscar results, and more.
The goal is to analyze the dataset through data visualization.
In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [3]:
pixar_movies = pd.read_csv("PixarMovies.csv")
print(pixar_movies.shape[0])  # number of rows
print(pixar_movies.shape[1])  # number of columns
cols = pixar_movies.columns.values
for el in cols:
    print(pixar_movies[el].dtypes)  # dtype of each column
pixar_movies.describe()
Out[3]:
In [4]:
pixar_movies.head(17)
Out[4]:
Data Cleaning¶
In [5]:
pixar_movies["Domestic %"] = pixar_movies["Domestic %"].str.rstrip('%').astype('float')
pixar_movies["International %"] = pixar_movies["International %"].str.rstrip('%').astype('float')
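As a small illustration of the cleaning step above, the sketch below applies the same `str.rstrip('%').astype('float')` chain to a toy Series. The values are made up for illustration and are not taken from PixarMovies.csv.

```python
import pandas as pd

# Hypothetical percentage strings, not taken from PixarMovies.csv
raw = pd.Series(["52.98%", "47.02%", "60%"])

# Strip the trailing percent sign, then convert the remaining text to float
cleaned = raw.str.rstrip("%").astype("float")

print(cleaned.dtype)
print(cleaned.tolist())
```

Once the columns are numeric, they can be plotted or summed directly, which is not possible while they are stored as strings.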
Rescale the IMDB Score from a 10-point scale to a 100-point scale so it is comparable with the RT and Metacritic scores
In [6]:
pixar_movies["IMDB Score"] = pixar_movies["IMDB Score"] * 10
In [7]:
filtered_pixar = pixar_movies.dropna()
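To make the effect of `dropna()` concrete, here is a minimal sketch on a toy frame. The column names mirror the dataset, but the rows are hypothetical. It shows that `dropna()` returns a new frame without the incomplete rows and leaves the original untouched, which is why the notebook keeps both `pixar_movies` and `filtered_pixar`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only
toy = pd.DataFrame(
    {"Movie": ["A", "B", "C"], "Domestic %": [55.0, np.nan, 40.0]}
)

# dropna() returns a new frame without rows that contain any NaN;
# the original frame is left unchanged
filtered = toy.dropna()

print(len(toy), len(filtered))
print(toy.isna().sum())  # per-column count of missing values
```

Checking `isna().sum()` before dropping rows is a quick way to see which columns are responsible for the rows that disappear.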
In [8]:
filtered_pixar.set_index("Movie", inplace = True)
pixar_movies.set_index("Movie", inplace = True)
pixar_movies.head()
Out[8]:
Data Visualization, Line Plots: Comparing Scores Across Review Sites¶
In [9]:
critics_reviews = pixar_movies[["RT Score","IMDB Score","Metacritic Score"]]
critics_reviews.plot()
plt.show()
In [10]:
critics_reviews.plot(figsize = (10,6))
plt.show()
Data Visualization, Box Plot: How the Scores Are Distributed¶
In [12]:
critics_reviews.plot(kind = "box", figsize = (9,5))
Out[12]:
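The numbers behind a box plot can also be computed directly. The sketch below uses made-up scores (not values from PixarMovies.csv) to derive the quartiles, the interquartile range, and the points that fall beyond 1.5 times the IQR, which is the default whisker rule in matplotlib.

```python
import pandas as pd

# Hypothetical scores, not taken from PixarMovies.csv
scores = pd.Series([74, 92, 96, 98, 99, 100])

# Quartiles: the box spans q1 to q3, with a line at the median
q1, median, q3 = scores.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```

Applying the same calculation to each column of `critics_reviews` would give the values that the box plot above encodes visually.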
Data Visualization, Stacked Bar Plot: Share of Revenue from the US and International Markets¶
In [16]:
revenue_proportions = filtered_pixar[["Domestic %","International %"]]
revenue_proportions.plot(kind = "bar", stacked = True)
Out[16]: